Abstract

Background: The diagnosis of many diseases can often be formulated as a decision problem. Because uncertainty
affects these problems, many computerized Diagnostic Decision Support Systems (in the following, DDSSs) have
been developed to aid the physician in interpreting clinical data and thus to improve the quality of the whole
process. Fuzzy logic, a well-established attempt at the formalization and mechanization of human capabilities in
reasoning and deciding with noisy information, can be profitably used. Recently, we informally proposed a general
methodology to automatically build DDSSs on top of fuzzy knowledge extracted from data.

Methods: We carefully refine and formalize our methodology, which includes six stages: the first three stages
work with crisp rules, whereas the last three operate on fuzzy models. Its strength lies in its
generality and modularity, since it supports the integration of alternative techniques in each of its stages.

Results: The methodology is designed and implemented in the form of a modular and portable software
architecture according to a component-based approach. The architecture is described in depth, and a summary
inspection of the main components in terms of UML diagrams is outlined as well. A first implementation of the
architecture has then been realized in Java following the object-oriented paradigm and used to instantiate a DDSS
example aimed at accurately diagnosing breast masses as a proof of concept.

Conclusions: The results prove the feasibility of the whole methodology implemented in terms of the architecture 
proposed. 



Background 

Making an effective and efficient medical diagnosis is pivotal
in daily clinical practice because of the impact that this
single decision-making process has on the eventual illness
trajectory and on disease management. For this reason,
optimizing the diagnostic process in terms of number and
duration of patient examinations, with corresponding
accuracy, sensitivity, and specificity, is known to reduce
morbidity and mortality rates, control costs and improve both
doctor-patient and community-facility relationships [1].

The task of medical diagnosis, like almost any other
diagnostic process, is made more complex, even for a medical
expert, by a web of relevant uncertainties, in the form of
information that is incomplete, imprecise, fragmentary, not
fully reliable, vague or contradictory [2]. Specifically,
patients may not be able to describe exactly the natural
history of their disease in terms of what has happened to
them or how they feel; doctors and health care practitioners
may not understand or interpret exactly what they hear or
observe; available laboratory reports may come with some
degree of error; and the effects of treatment on an
individual patient, or on a group or population as a whole,
in terms of how diseases alter the normal functioning of the
body, may be uncertain [3]. The need to identify the most
accurate medical diagnosis in a very timely manner increases
dramatically in the case of potentially fatal diseases, as
both rapid and accurate diagnosis and prompt initiation of
treatment are recognized as necessary conditions to limit
further complications, cut costs and reduce human suffering.

In order to improve the possibility of early and accurate 
diagnosis of illness, there is thus the need for the applica- 
tion of Diagnostic Decision Support Systems (in the fol- 
lowing, DDSSs) in the process, because these are known 
to improve practitioner's performance, reduce costs and 
improve patient outcomes [4]. 

The first DDSSs proposed in literature relied on crisp 
models based on thresholding for solving medical classifi- 
cation problems. Nevertheless, they neither take into 
account the fuzziness of input data nor reproduce the 
expert decision-making process applied in a vague-laden 
domain such as medicine. As a matter of fact, the deci- 
sion-making model every trustworthy physician has in 
mind to perform heuristic diagnosis is often pervaded by 
uncertainty and vagueness. 

Recently, DDSSs based on a multi-valued logic and, in 
particular, on Fuzzy Logic, have been applied to medical 
classification problems (see, for example, [5], [6] and [7]) 
demonstrating their capability both to overcome the pro- 
blem of managing imprecise and uncertain knowledge 
and offer a support for the medical decision making pro- 
cess. Moreover, Fuzzy Logic is quite close to natural lan- 
guage and allows presenting the results to clinicians in a 
more natural form. This makes fuzzy based DDSSs more 
acceptable to human users than black box systems, 
because both the semantic expressiveness and reasoning 
(by using linguistic variables and rules) are comprehensi- 
ble and may be validated by human inspection. 

Crisp and fuzzy modeling 

The first DDSSs were mainly based on Crisp Logic, 
which provides an inference morphology for drawing 
conclusions from existing neat and clear-cut information: 
specifically, new truths can be inferred from old ones. 
In more detail, it relies on rules, defined as conditional
statements written in the following form: if crispAntecedent
then crispConsequent, where crispAntecedent is a crisp-logic
expression made of one or more simple predicates linked by
logical operators and depending on input crisp variables, and
crispConsequent is an expression of output variables which
are assigned crisp values. A crisp variable C (also named
feature) is characterized by a set of crisp values c it can
assume, whereas a predicate P classifies the values belonging
to a variable into two groups or categories, i.e. values that
make it either true or false, respectively. In other words,
the set defined by P(c) is written as {c | P(c)}, and is just
the collection of all the values for which P is true. For
instance, {c | c is a positive integer less than 3} is the
set {1, 2}.

Crisp modeling is the task of determining the parameters
characterizing a rule base, classified as follows:

• Structural parameters. Related mainly to the size
of the rule base, they include the number of variables
involved in the rules and the number of rules.

• Connective parameters. Related to the topology of
the rule base, these parameters include antecedents,
consequents, and weights of the rules.

Unlike Crisp Logic, Fuzzy Logic resembles human rea- 
soning in its use of vague information to generate deci- 
sions [8], where vague predicates are used and values 
belonging to a variable cannot be classified into two 
groups (either true or false). In this sense, Fuzzy Logic 
incorporates an alternative way of reasoning, which allows 
modeling complex systems using a higher level of abstrac- 
tion originating from knowledge and experience [9]. 

In more detail, in Fuzzy Logic, a fuzzy variable F (also
named linguistic variable) represents a concept that is
measurable in some way, either objectively or subjectively,
and is defined by a set of fuzzy terms T (also named fuzzy
sets) and by the membership functions $\mu_T$ associated with
these terms; fuzzy terms assign a membership value from 0 to
1 to elements u within a predetermined range U (named
the universe of discourse) as follows:

$$T = \{(u, \mu_T(u)) \mid u \in U \text{ and } \mu_T : U \to [0, 1]\} \qquad (1)$$

The central notion, thus, is that truth values or member- 
ship values are indicated by a value on the range [0, 1], 
with 0 representing absolute false and 1 representing abso- 
lute truth. For instance, Figure 1 shows the linguistic vari- 
able Heart Rate made of three terms (low, medium and 
high), and defined in U = [0, 150] bpm (beats per minute). 
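
As an illustration of equation 1, the following minimal Python sketch models the Heart Rate variable with three trapezoidal membership functions over U = [0, 150] bpm. The trapezoidal shape, the breakpoints and the names `trapezoid`, `HEART_RATE_TERMS` and `memberships` are assumptions made for this example only; they are not read from Figure 1.

```python
def trapezoid(u, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], 0 outside (a, d), linear in between."""
    if b <= u <= c:
        return 1.0
    if u <= a or u >= d:
        return 0.0
    if u < b:
        return (u - a) / (b - a)
    return (d - u) / (d - c)


# Hypothetical breakpoints (bpm); the actual shapes in Figure 1 may differ.
HEART_RATE_TERMS = {
    "low": (0, 0, 50, 70),
    "medium": (50, 70, 90, 110),
    "high": (90, 110, 150, 150),
}


def memberships(u):
    """Map a crisp heart rate u in U = [0, 150] to one membership value per term."""
    return {term: trapezoid(u, *params) for term, params in HEART_RATE_TERMS.items()}


print(memberships(60))   # partly "low" and partly "medium"
print(memberships(120))  # fully "high"
```

With these assumed breakpoints, a value such as 60 bpm belongs to two terms at once, which is exactly what a crisp predicate cannot express.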

The fuzzy inference morphology relies on a Fuzzy Inference
System (in the following, FIS) based on if fuzzyAntecedent
then fuzzyConsequent rules, where fuzzyAntecedent is a
fuzzy-logic expression made of one or more simple fuzzy
expressions linked by fuzzy operators and depending on input
fuzzy variables, and fuzzyConsequent is an expression of the
output variables which are assigned fuzzy terms.

Essentially, fuzzy reasoning is made of four steps, namely
fuzzification of input variables, rule evaluation,
aggregation of rule outputs and, finally, defuzzification.
Fuzzification of input variables converts crisp (real-valued)
inputs into fuzzy terms. For each fuzzy rule, rule evaluation
applies such fuzzified inputs to its antecedents, making use
of a fuzzy operator in case of multiple antecedents, with the
final aim of generating a single value indicating its degree
of activation. Such a value is then used in the implication
to infer the conclusion of the rule. Aggregation combines the
membership functions of all rule consequents previously
evaluated in order to generate a single fuzzy set as output.
Finally, defuzzification determines the best representative
crisp value of this aggregated output fuzzy set.
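
The four steps can be sketched in a few lines of Python. This is only an illustrative Mamdani-style example under stated assumptions: the two rules, the variables (heart rate, temperature, risk), the triangular membership functions, min as the AND and implication operator, max as the aggregation operator and the centroid defuzzifier are all hypothetical choices, not the configuration used in this work.

```python
def tri(u, a, b, c):
    """Triangular membership peaking at b; a == b or b == c gives a shoulder."""
    if u == b:
        return 1.0
    if u <= a or u >= c:
        return 0.0
    return (u - a) / (b - a) if u < b else (c - u) / (c - b)


def infer(heart_rate, temperature, steps=100):
    # 1. Fuzzification: crisp inputs become membership degrees.
    hr_low = tri(heart_rate, 0, 0, 75)
    hr_high = tri(heart_rate, 75, 150, 150)
    temp_high = tri(temperature, 36.5, 42, 42)

    # 2. Rule evaluation: min plays the role of the fuzzy AND.
    #    R1: if heart rate is high and temperature is high then risk is high
    #    R2: if heart rate is low then risk is low
    act_high = min(hr_high, temp_high)
    act_low = hr_low

    # 3. Implication (clip each consequent at its activation) and
    #    aggregation (pointwise max) over a sampled output universe [0, 1].
    ys = [i / steps for i in range(steps + 1)]
    aggregated = [
        max(min(act_high, tri(y, 0, 1, 1)), min(act_low, tri(y, 0, 0, 1)))
        for y in ys
    ]

    # 4. Defuzzification: centroid of the aggregated fuzzy set.
    area = sum(aggregated)
    if area == 0:
        return None  # no rule was activated
    return sum(y * m for y, m in zip(ys, aggregated)) / area


print(infer(120, 39.25))  # R1 is activated: the crisp risk lies above 0.5
print(infer(30, 36.0))    # R2 is activated: the crisp risk lies below 0.5
```

The centroid is computed on a sampled grid for simplicity; a finer grid (larger `steps`) gives a closer approximation of the continuous centroid.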

Fuzzy modeling is the task of determining the parameters
of a FIS, classified as follows [10]:

• Logical parameters. They include the shape of the
membership functions, the fuzzy logic operators
applied for AND, OR, implication, and aggregation
operations, and the defuzzification method.

• Structural parameters. Related mainly to the size of
the fuzzy system, they include the number of variables
involved in the rules, the number of membership
functions defined for each linguistic variable, and the
number of rules.

• Connective parameters. Related to the topology of
the system, these parameters include antecedents,
consequents, and weights of the rules.

• Operational parameters. These parameters define the
mapping between linguistic and numeric representations
of the variables, so characterizing the membership
functions of the linguistic variables.

The most usual, and cheapest, way of modeling medical
knowledge in fuzzy-based DDSSs is to ask an expert to write
if-then rules. After formalizing the expert's knowledge in
the form of rules, the designer and the expert have to choose
the shape and location of membership functions for all the
linguistic values related to all the variables involved. This
step requires both medical expertise and technical
intervention, along with great effort to identify which
design choices suit the given problem. Alternatively, an
emerging solution is data-driven fuzzy modeling, which is
being widely adopted in different application domains to
automatically generate a rule base from data, even if
interpretability is not guaranteed in many situations and
redundancy can occur in the rules produced.

According to [11], a fuzzy model is interpretable when:
(i) the fuzzy terms associated with a variable (usually
referred to as a fuzzy partition) are interpretable as
linguistic labels, (ii) the rule base is as small as
possible, and (iii) the if-part of each rule does not include
all the independent variables but only a subset of them.

State of the art 

A number of fuzzy-based DDSSs have faced these challenges
and have attempted to address the subjects of knowledge
acquisition, representation, and utilization in medical
diagnosis.

In [12], a rule-base self-extraction and simplification 
method is proposed, devised to establish interpretable 
fuzzy models from numerical data. A fuzzy clustering 
technique associated with a fuzzy partition validity index is 
used to extract the initial fuzzy rule-base and find out the 
optimal number of fuzzy rules. To reduce the complexity 
of fuzzy models while keeping good model accuracy, some 
approximate similarity measures are presented and a para- 
meter fine-tuning mechanism is introduced to improve 
the accuracy of the simplified model. Experimental results 
are reported with respect to different case studies, such as 
function approximation, dynamical system identification 
and mechanical property prediction for hot rolled steels. 
These test-cases are characterized by a relatively small 
number of input-space variables. No experimental test has 
been reported for problems characterized by a high num- 
ber of input-space variables. 

In [13], an evolving hierarchical fuzzy system based on
probabilistic incremental program evolution is presented.
The use of hierarchical fuzzy systems makes it possible to
limit both the number of rules and the number of fuzzy
operations with respect to single-level systems. Promising
results are described for case studies concerning nonlinear
system identification, such as Mackey-Glass chaotic time
series prediction, and the Iris and Wine classification.

In [14] a data-driven innovative approach is presented 
for generating a fuzzy rule based decision support system 
for the diagnosis of coronary artery disease. The
implemented methodology relies on four stages.

In the first stage, a decision tree is induced from the 
dataset, while in the second stage, a set of rules is 
extracted from it. This set of rules is in Disjunctive Normal 
Form (DNF) and involves crisp variables modeling neat 
and clear-cut quantities. It can be used as a whole to clas- 
sify new incoming data coherently with the knowledge 
embedded into the initial dataset. In the third stage, the 
crisp model is fuzzified, i.e., the crisp rules are transformed 
into fuzzy ones, using a fuzzy membership function 
instead of the crisp one and definitions of S and T norms. 
Finally, in the fourth stage, the parameters entering the 
fuzzy model are adapted using a global optimization 
technique. 

In [15] a generic methodology is presented for the 
automated generation of fuzzy models. The methodology 
is realized in three stages. Initially, a crisp model is cre- 
ated whereas, in the second stage it is transformed into a 
fuzzy one. In the third stage, all parameters entering the 
fuzzy model are optimized. A specific realization of this 
methodology is implemented, using decision trees for the 
creation of the crisp model, the sigmoid function, the 
min-max operators and the maximum defuzzifier for the 
transformation of the crisp model into a fuzzy one, and 
four different optimization strategies, including global 
and local optimization techniques, as well as hybrid 
approaches. 

In [16] a generic approach to the design of interpretable 
data-driven fuzzy models, which can be used in the con- 
struction of DDSSs, is proposed. The approach addresses 
several design steps, including fuzzy partitioning, rule learn- 
ing, variable selection and rule base simplification. The 
fuzzy partitioning step consists in generating a collection of 
fuzzy partitions of various sizes from two to a user-defined 
maximum value, based upon indices or an objective func- 
tion. The rule learning step includes two categories of 
methods, namely region based methods and prototype 
based ones. The rule base simplification merges some rules 
into a more generic incomplete rule, where some variables 
(one or more) appear in some rules. The variable selection 
determines the number of terms for a given variable neces- 
sary to get a good rule base, in terms of trade-off between 
its complexity and accuracy, measured by performance 
indexes. 

The contribution of the work 

Recently, we informally introduced [17,18] a methodol- 
ogy to design and develop a fuzzy-based DDSS for medi- 
cal classification problems by extracting fuzzy knowledge 
from data. In this work, we first propose a formalization 
of a refined and assessed version of such a methodology, 
which essentially specifies the flow of stages needed to
develop a fuzzy-based DDSS, as well as the characteristics
of the inputs consumed and the outputs produced in the
different stages. As a result, it formalizes the role
assumed by each stage in terms of its interface, but it
does not prescribe how the single activities have to be
carried out, in the form of strategies to be adopted or
algorithms to be applied, especially because this choice is
strictly linked to the specific application domain. This is
further motivated by the fact that the methodology, whose
strength lies in its generality and modularity, has been
conceived as a basis for the development process of
fuzzy-based DDSSs, supporting the integration of alternative
techniques in each of its stages.
Both the generality and flexibility make it applicable to 
almost any medical classification domain and, also, 
enable the possibility to test the efficiency of different 
methods in order to detect their best integration with 
respect to specific classes of problems. 

In more detail, the assessed version of the methodology
(Figure 2) includes six stages: (i) extraction of crisp
rules, (ii) selection of a significant partition from the
extracted rule set, (iii) reduction of the selected rule set,
(iv) creation of fuzzy rules, (v) generation of the whole
fuzzy inference system and (vi) its optimization. In the
first three stages a set of crisp rules is initially created
and then elaborated in order to comply with some
characteristics that this methodology requires to make the
fuzzification feasible. In this work, such characteristics
are formulated with the definition of fuzzifiability.
Subsequently, the last three stages are in charge of
(i) transforming the (fuzzifiable) crisp rules into the
corresponding fuzzy versions, i.e. in terms of connective and
structural parameters, (ii) defining the most appropriate
logical parameters to be used in the fuzzy inference system
and (iii) optimizing all the operational parameters composing
the fuzzy rules and, if required, also the relative relevance
of each of them, specified in the form of a weight.

The methodology has been realized in the form of
a modular and portable architecture according to a
Component-Based Software Development (in the following,
CBSD) approach, with the aim of defining a collection of
components that can be customized or extended by existing
available solutions compatible with the original
placeholders.

The architecture has been developed in Java according 
to the object-oriented paradigm in order to create a truly 
portable DDSS, solving the problem of having parts of it 
implemented for different platforms. The resulting archi- 
tecture can be considered as well-suited for almost any 
medical domain where the real world is simulated in 
a broad sense and a diagnosis in terms of classification 
is required. 

As a proof of concept, such an architecture has been
used to instantiate a DDSS example aimed at accurately
diagnosing breast masses starting from the widely used
Wisconsin Breast Cancer Dataset (in the following,
WBCD) to evaluate the feasibility of the methodology.

Figure 2 Our six-step methodology in terms of activity diagram

The manuscript is structured as follows: in Methods,
the proposed methodology is formally described and the
choices and techniques identified for the specific
realization are analyzed. Furthermore, the design
approach used for the development of the proposed
architecture is introduced. In Results, the architecture
designed is described in detail and a summary inspection
of the main components is reported as well. Moreover,
the proof-of-concept DDSS for validating the methodology
is introduced and its results are discussed. Finally,
conclusions and ongoing activities are outlined in the
last section.

Methods 

Our methodology includes six stages, where the first 
three stages work with crisp models, whereas the last 
three ones are employed on fuzzy models (Figure 2). 

Rule extraction 

The Rule Extraction stage is devoted to the extraction,
from a specific input dataset, of a collection of if-then
rules constituting the crisp rule base (in the following,
CRB), specifically represented in a weighted Disjunctive
Normal Form (wDNF). We work with this representation because
of its high degree of compactness and knowledge synthesis.
The CRB, thus, is a disjunctive system of crisp rules, where
at most one rule must be satisfied by an item of the initial
dataset, i.e. the rules are linked by mutually exclusive or
connectives. More formally, each crisp rule $r^c_{ij}$ (note
that the superscript c is used to label a rule as crisp) in
the CRB denotes the i-th rule which predicts the j-th class,
with j = 1... M and i = 1... $N_j$, where M is the total
number of classes and $N_j$ is the total number of rules that
predict the j-th class. As a result, in general, there could
be one or more rules for each class predicted, while each
class is likely to be covered by at least one rule in the
CRB.

The structure of each rule $r^c_{ij}$ is composed of a
conjunction of antecedent predicates (which can be evaluated
to be either true or false), based on the set of features
$X^c = \{x^c_l\}$, with l = 1... L (where L is the number of
features of the initial dataset), and one consequent term
indicating the specific class predicted. In more detail, for
the i-th rule which predicts the j-th class, given the set of
crisp predicates $P^c_{ij} = \{p^c_{ijk}\}$, with
k = 1... $K_{ij}$ (where $K_{ij}$ is the total number of
predicates for the i-th rule which predicts the j-th class),
and letting $y^c_j$ be the consequent crisp term representing
the predicted j-th class, its formulation is defined as:

$$r^c_{ij} : p^c_{ij1}(X^c) \wedge \dots \wedge p^c_{ijK_{ij}}(X^c) \rightarrow y^c_j \qquad (2)$$

with each crisp predicate expressed as:

$$p^c_{ijk}(X^c) = (x^c_{ijk}\ op^c_{ijk}\ v^c_{ijk}) \qquad (3)$$

where $x^c_{ijk}$ is the specific feature selected from the
set $X^c$, $op^c_{ijk}$ is a comparison operator selected
from the sets $\{=, \neq\}$ and $\{<, \leq, >, \geq\}$ in the
cases of categorical and numerical features, respectively,
and $v^c_{ijk}$ represents a categorical value or a crisp
numerical threshold.



This DNF is labeled as weighted since each rule is asso- 
ciated with a degree of relevance, such as its coverage or 
accuracy, with respect to its predicted class depending on 
the domain of application and the specific dataset con- 
cerning this domain. 
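
A weighted crisp rule base of this kind can be represented with a very small data model. The sketch below is a hypothetical illustration, not the Java architecture described in this paper: the class names `Predicate` and `CrispRule`, the function `classify`, the feature names, the thresholds and the weights are all assumptions made for the example.

```python
import operator
from dataclasses import dataclass

# Comparison operators admitted in a crisp predicate (equation 3).
OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


@dataclass(frozen=True)
class Predicate:
    feature: str
    op: str
    value: object

    def holds(self, item):
        return OPS[self.op](item[self.feature], self.value)


@dataclass(frozen=True)
class CrispRule:
    predicates: tuple      # conjunction of antecedent predicates
    predicted_class: str   # consequent term
    weight: float = 1.0    # degree of relevance, e.g. coverage or accuracy

    def fires(self, item):
        return all(p.holds(item) for p in self.predicates)


def classify(rule_base, item):
    """Return the class of the satisfied rule, or None if no rule is satisfied.

    The rules are expected to be mutually exclusive; should several be
    satisfied anyway, the one with the highest weight is used.
    """
    fired = [r for r in rule_base if r.fires(item)]
    return max(fired, key=lambda r: r.weight).predicted_class if fired else None


# Hypothetical rule base with made-up thresholds and weights.
crb = [
    CrispRule((Predicate("clump_thickness", ">", 5),
               Predicate("cell_size", ">", 3)), "malignant", 0.9),
    CrispRule((Predicate("clump_thickness", "<=", 5),), "benign", 0.8),
]
print(classify(crb, {"clump_thickness": 7, "cell_size": 4}))
```

An item that satisfies no rule is left unclassified (`None`), which reflects the "at most one rule" reading of the CRB given above.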

Different solutions can be adopted to extract rules, ran- 
ging from purely logical approaches to statistically-based 
ones or relying on artificial neural networks, genetic algo- 
rithms and on non-connectionist machine learning (e.g. 
decision trees) [19]. Regardless of the specific method
used, it is possible to extract rules that correctly classify
every item in the dataset from its known features, i.e. every
item in the input dataset is covered by exactly one rule in
the CRB. This, however, does not rule out overfitting the
input dataset, which can be characterized by a degree of
uncertainty. This uncertainty may arise from two different
sources. The first is mis-measurement, i.e., for a variety of
reasons, the value of a feature or class may be incorrectly
measured. The second source of uncertainty
is the occurrence of extraneous factors not recorded, but 
affecting the results so that the class of an item in the 
dataset cannot be determined wholly from its recorded 
features. The resulting CRB extracted in these situations 
tends to be very large and many rules reflect particular 
items in the training dataset which are very unlikely to 
occur in further examples, i.e. they cover a very small 
part of the input space, are matched only by a few exam- 
ples, lack generality and can become counter-productive. 
This issue represents the motivation for the second stage, 
i.e. Selection. 

Selection 

The Selection stage is in charge of determining the number
of rules sufficient to obtain a good CRB with respect to the
specific dataset concerning the domain under observation,
where a good CRB represents a reasonable trade-off between
complexity, determined by the number of rules, and accuracy,
measured by appropriate performance indexes. The selection
aims to preserve two factors that have emerged as primary
determinants of interpretability. First, the number of rules
should be small, which implies that a full set of complete
rules should be avoided, since it can quickly lead to a
combinatorial explosion when the number of features rises
[16]. The second condition is strictly linked to the first
one and is specific to complex systems with a large number of
features: rules must not systematically include all the
features, but only the important ones in the context of the
rule, so generating the so-called incomplete rules [16].

Different methods can be applied to perform the rule
selection and, thus, implicitly also the variable selection,
each of them exploiting ad-hoc heuristics guided by
user-defined indicators (such as the final number of rules
to be chosen, from a minimum of one for each predicted
class to a maximum which can be indicated by the user) and
suited to the different applications and their specific
requirements. For example, the choice of a good CRB can be
driven by a longest-match criterion, i.e. depending on the
longest left-hand side (LHS) of the rules that match an item
of the dataset. Its rationale is that a longer antecedent
part will contain more accurate and richer information for
the final classification than a shorter one. Alternatively, a
most-confident selection could be adopted, identifying the
rule with the highest confidence as the best one, where the
longer rule is chosen in case of a tie. The rationale of this
criterion is based on the assumption that the testing dataset
will share the same characteristics as the training dataset
used in the Rule Extraction. So if a rule has a high
confidence in the training data, then this rule will also
show a high confidence in the testing data, which means the
class predicted by that rule will be most likely to occur
next.
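
The two criteria can be sketched as interchangeable ranking keys. The example below is a simplification made for illustration: it ranks rules globally within each predicted class rather than per matched item, and the rule representation (dictionaries with `lhs`, `cls` and `confidence` fields), the function names and the sample values are all hypothetical.

```python
from collections import defaultdict


def longest_match_key(rule):
    # Longer antecedents are assumed to carry richer information.
    return (len(rule["lhs"]),)


def most_confident_key(rule):
    # Highest confidence first; the longer rule wins in case of a tie.
    return (rule["confidence"], len(rule["lhs"]))


def select_rules(rules, per_class, key):
    """Keep the `per_class` best-ranked rules for every predicted class."""
    by_class = defaultdict(list)
    for rule in rules:
        by_class[rule["cls"]].append(rule)
    selected = []
    for cls in sorted(by_class):
        ranked = sorted(by_class[cls], key=key, reverse=True)
        selected.extend(ranked[:per_class])
    return selected


# Hypothetical extracted rules (antecedents abbreviated to labels).
rules = [
    {"name": "A", "lhs": ["p1", "p2", "p3"], "cls": "benign", "confidence": 0.90},
    {"name": "B", "lhs": ["p1"], "cls": "benign", "confidence": 0.95},
    {"name": "C", "lhs": ["p4", "p5"], "cls": "malignant", "confidence": 0.80},
    {"name": "D", "lhs": ["p4", "p5", "p6"], "cls": "malignant", "confidence": 0.80},
]
print([r["name"] for r in select_rules(rules, 1, longest_match_key)])
print([r["name"] for r in select_rules(rules, 1, most_confident_key)])
```

Passing the criterion as a function mirrors the idea that alternative techniques can be plugged into the same stage; the `per_class` argument plays the role of the user-defined number of rules to retain.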

Summarizing, this stage has a double relevance in terms 
of the functionalities offered: first, it performs the rule 
selection to achieve a smaller number of more general 
rules with the idea that it may have greater predictive 
accuracy on unseen data, at the expense of no longer cor- 
rectly classifying some of the items in the original training 
dataset; then, it implicitly carries out also a feature selec- 
tion leading to an incomplete rule base which takes into 
consideration only those features that are really required 
since included into the significant rules previously 
selected. 

It is worth noting that the elimination of features in
order to obtain incomplete rules could be undertaken at
the extraction level, since the Rule Extraction step can
remove features from the whole rule, for example using
statistical indexes. This stage, instead, is intended to
select only a subset of the features, rather than the whole
set of candidate ones, for other reasons: i) it can be
cheaper to measure only a reduced set of features;
ii) prediction accuracy may be improved through exclusion of
poorly significant features; iii) the final DDSS to be built
might be simpler and potentially faster when fewer input
variables are used; iv) knowing which features are more
relevant can give insight into the nature of the
classification problem and allow a better understanding of
the final DDSS. At this point, in the context of the same
rule $r^c_{ij}$ given as output after the Selection, with
respect to numerical features, different predicates can
contain the same feature selected from the set $X^c$, i.e.
$x^c_{ijh} = x^c_{ijk}$ with $h, k \in \{1, \dots, K_{ij}\}$
and $h \neq k$. This consideration represents the motivation
for the third stage, Reduction.



Reduction 

The Reduction stage is in charge of simplifying the struc- 
ture of each crisp rule in order to make it compliant with 
some characteristics, which are strongly necessary for 
this methodology to make the fuzzification feasible. In 
this work, such characteristics are formulated with the 
definition of fuzzifiability. A crisp rule is defined as
fuzzifiable if and only if each of its numerical features
appears in one or at most two predicates in its antecedent
part, according to one of the following forms:

$$p^c_{I}(x^c) = (x^c < v^c) \qquad (4)$$

$$p^c_{II}(x^c) = (x^c > v^c) \qquad (5)$$

$$p^c_{III}(x^c) = (x^c < v^c_1 \wedge x^c > v^c_2 \ \text{with}\ v^c_1 > v^c_2) \qquad (6)$$

This stage thus applies a simplification procedure that
iteratively searches, within each rule, for each pair of
predicates that involve the same feature, use comparable
operators and need to be made compatible with
fuzzifiability. It is important to point out the meaning of
comparable operators. Two comparison operators are regarded
as comparable in this procedure if and only if they appear in
one of the situations reported in equation 7, independently
of their order:

$$comp = (op^c_1 = '\geq' \wedge op^c_2 = '>') \vee (op^c_1 = '\leq' \wedge op^c_2 = '<') \qquad (7)$$

When a pair of candidate predicates is detected, since in
the rule under evaluation they are connected by conjunctions
(recall that this is due to the fact that the CRB is in DNF),
they can be reduced as follows:

$$(x^c\ op^c_1\ v^c_1) \wedge (x^c\ op^c_2\ v^c_2) \rightarrow x^c > \max(v^c_1, v^c_2) \quad \text{if}\ (op^c_1 \in \{>, \geq\} \wedge op^c_2 \in \{>, \geq\}) \qquad (8)$$

$$(x^c\ op^c_1\ v^c_1) \wedge (x^c\ op^c_2\ v^c_2) \rightarrow x^c < \min(v^c_1, v^c_2) \quad \text{if}\ (op^c_1 \in \{<, \leq\} \wedge op^c_2 \in \{<, \leq\}) \qquad (9)$$

After applying this simplification procedure to every
pair of candidate predicates, each resulting predicate
will be formulated according to one of the forms defined
in equations 4-6. A clarifying note is needed about the
categorical features. Since they assume mutually exclusive
values and, in each rule, the antecedent predicates can be
connected only by conjunction operators, it would be
meaningless for two different predicates to contain the same
feature assuming different values in the context of the same
rule. Thus, the Reduction does not involve the categorical
features in its simplification procedure.
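
The simplification procedure can be sketched as a single pass that keeps, for each numerical feature, only the tightest lower bound and the tightest upper bound. The function names and the tuple representation `(feature, op, value)` are assumptions made for this example. One deliberate deviation should be noted: the reduced forms are written above with strict inequalities, whereas this sketch retains the operator of the tighter bound so that the reduced rule stays logically equivalent to the original one.

```python
LOWER_OPS = (">", ">=")
UPPER_OPS = ("<", "<=")


def tighter(current, candidate, is_lower):
    """Return the tighter of two same-side bounds, each an (op, value) pair."""
    (op_a, v_a), (_, v_b) = current, candidate
    if v_a == v_b:
        # Equal thresholds: the strict operator is the tighter one.
        return current if op_a in (">", "<") else candidate
    if is_lower:
        return current if v_a > v_b else candidate
    return current if v_a < v_b else candidate


def reduce_rule(predicates):
    """Merge same-direction predicates on the same numerical feature.

    Predicates with other operators (e.g. categorical equality) are left
    untouched, as the Reduction does not involve categorical features.
    """
    lower, upper, untouched, order = {}, {}, [], []
    for feature, op, value in predicates:
        if op in LOWER_OPS or op in UPPER_OPS:
            is_lower = op in LOWER_OPS
            side = lower if is_lower else upper
            bound = (op, value)
            side[feature] = (
                tighter(side[feature], bound, is_lower) if feature in side else bound
            )
            if feature not in order:
                order.append(feature)
        else:
            untouched.append((feature, op, value))
    reduced = list(untouched)
    for feature in order:
        for side in (lower, upper):
            if feature in side:
                op, value = side[feature]
                reduced.append((feature, op, value))
    return reduced


rule = [("size", ">", 3), ("size", ">=", 5), ("size", "<", 9),
        ("size", "<=", 8), ("shape", "=", "round")]
print(reduce_rule(rule))
```

After the pass, each numerical feature appears in at most two predicates (one lower bound and one upper bound), which is the structural requirement behind the forms in equations 4-6.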

Fuzzification 

After the first three stages, a crisp model made of rules 
based on clear-cut boundaries has been generated in 
accordance with the fuzzifiability property. Successively,
starting from the crisp model produced, the generation 
of the fuzzy model begins with the fourth stage, named 
Fuzzification. 

In more detail, in order to create fuzzy variables and
terms, in this stage the crisp rules are translated into a
corresponding fuzzy version, where every feature contained in
the crisp rules is associated with a linguistic variable. Note
that only the features found relevant to the classification
by the previous stages are fuzzified. The predicted class is
also associated with a linguistic variable. Subsequently, each
linguistic variable is further characterized by a set of terms
subjectively describing it. The degree of detail used in
partitioning the universe of discourse of each variable, i.e.
the number of linguistic terms referred to it, is chosen in
accordance with the crisp model. In summary, the fuzzy rule
base (in the following, FRB) obtained in this stage continues
to be a disjunctive system of rules. Nevertheless, differently
from the crisp case, where exactly one rule must be satisfied,
the fuzzy rules are linked by simple OR connectives, where at
least one rule must be satisfied, i.e. one or more rules may
be weakly or strongly activated simultaneously.

Each fuzzy rule $r^f_{ij}$ (note that the superscript $f$ is
used to denote a rule as fuzzy) in the FRB denotes the $i$-th
rule which predicts the $j$-th class, with $j = 1 \dots M$ and
$i = 1 \dots N_j$, where $M$ is the total number of classes and
$N_j$ is the total number of fuzzy rules that predict the
$j$-th class. The structure of each fuzzy rule $r^f_{ij}$ is
composed of a conjunction of antecedent fuzzy predicates,
based on the set of linguistic variables $x^f = \{x^f_u\}$, with
$u = 1, \dots, U_c$ (where $U_c$ is the number of features
effectively used in the crisp model), and one consequent fuzzy
variable indicating the specific class predicted. More
formally, for the $i$-th rule which predicts the $j$-th class,
given the set of fuzzy predicates $P^f_{ij} = \{p^f_{ijk}\}$,
with $k = 1 \dots K_{ij}$ (where $K_{ij}$ is the total number
of predicates for the $i$-th rule which predicts the $j$-th
class), and letting $y^f_j$ be the consequent linguistic
variable representing the predicted $j$-th class, its
formulation is defined as follows:

$$r^f_{ij} : \left( p^f_{ij1}(x^f) \wedge \dots \wedge p^f_{ijK_{ij}}(x^f) \right) \rightarrow y^f_j \qquad (10)$$

with each fuzzy predicate expressed as:

$$p^f_{ijk}(x^f) \equiv \left( x^f_{ijk} \ \text{is} \ v^f_{ijk} \right) \qquad (11)$$

where $v^f_{ijk}$ represents a (fuzzified) numerical interval in
the case $x^f_{ijk}$ is a fuzzy variable associated with a
numerical feature, or a (fuzzified) categorical value in the
case $x^f_{ijk}$ is a fuzzy variable associated with a
categorical feature. Finally, the FRB is also weighted, since
each fuzzy rule is associated with the same degree of
relevance as the crisp rule it has been generated from.

Different methods can be applied to fuzzify crisp rules
[14], [15], [20], [21]. They range from solutions which, on
the one hand, exploit the symbolic structure of the crisp
rules to generate the fuzzy variables and terms to be inserted
into the predicates of the corresponding fuzzy rules and, on
the other hand, soften the sharp crisp thresholds so as to
protect continuous values close to the decision boundaries
from misclassification, to approaches where the crisp rule
structure is only used to define fuzzy variables and terms and
the sharp thresholds are not taken into account at all in the
fuzzification process.
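
As a minimal sketch of what softening a sharp threshold can look like, the following Java fragment replaces a crisp predicate x > v with a linear ramp of half-width delta centred on v. The ramp shape, the class name and the parameter names are illustrative assumptions, not the specific methods of [14], [15], [20], [21].

```java
/** Illustrative softening of a crisp threshold; names and ramp shape are assumptions. */
final class SoftThreshold {
    private SoftThreshold() {}

    /** Crisp predicate (x > v): returns 1.0 or 0.0 with a sharp jump at v. */
    static double crispHigh(double x, double v) {
        return x > v ? 1.0 : 0.0;
    }

    /**
     * Fuzzified predicate "x is High": 0 below v - delta, 1 above v + delta,
     * and a linear ramp in between, so that the membership is 0.5 at v.
     */
    static double fuzzyHigh(double x, double v, double delta) {
        if (delta <= 0.0) {
            throw new IllegalArgumentException("delta must be positive");
        }
        double degree = (x - (v - delta)) / (2.0 * delta);
        return Math.max(0.0, Math.min(1.0, degree));
    }

    /** Complementary term "x is Low", so that Low + High = 1 everywhere. */
    static double fuzzyLow(double x, double v, double delta) {
        return 1.0 - fuzzyHigh(x, v, delta);
    }
}
```

With a hypothetical threshold v = 98.92 and delta = 5, a value of 98.0 is rejected outright by the crisp predicate but receives a membership of about 0.41 in High, so a sample lying just below the boundary still contributes to the rule.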

FIS configuration 

At this point, after generating the fuzzy model in terms of
rules, linguistic variables and terms, the overall Fuzzy
Inference System (FIS) underpinning the DDSS has to be
generated in the fifth stage, named FIS Configuration,
depending on the domain of application and its specific
requirements. As depicted in Figure 3, a FIS is a system aimed
at solving a typically complex and nonlinear problem by
utilizing fuzzy logic methodologies. Its basic structure
includes four main components, namely a Fuzzifier (which
translates real-valued inputs into fuzzy values), an Inference
Engine (which applies a fuzzy reasoning mechanism to obtain a
fuzzy output), a Defuzzifier (which translates this latter
output into a crisp value), and a Fuzzy model (containing
fuzzy rules, linguistic variables and membership functions).

The connective and structural parameters of the FIS to be
generated have thus been defined in the previous stage. In
this stage, instead, the most appropriate logical parameters
to be used in the FIS have to be determined by the designer
based on experience and depending on the domain
characteristics. Typical choices for the reasoning mechanism
are Mamdani-type, Takagi-Sugeno-Kang (TSK)-type, and
Singleton-type. Common fuzzy operators are min, max, product,
probabilistic sum, and bounded sum. The most common
membership functions are triangular, trapezoidal, Gaussian and
bell-shaped. For defuzzification, several methods have been
proposed, with the center of area (COA) and the mean of maxima
(MOM) methods being the most popular. Moreover, depending on
the type of reasoning mechanism desired, different Inference
Engines can be used, for instance, for supporting rule
chaining [22] or operating in accordance with an If-Then-Else
rule structure [23].
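
To make these logical parameters concrete, the sketch below writes out the standard definitions of the operators and of three of the membership function shapes listed above. It is a self-contained illustration with hypothetical class and method names, not the API of any fuzzy logic library.

```java
/** Common fuzzy connectives and membership functions; a sketch, not a library API. */
final class FuzzyOperators {
    private FuzzyOperators() {}

    // T-norms (fuzzy AND)
    static double min(double a, double b) { return Math.min(a, b); }
    static double product(double a, double b) { return a * b; }

    // S-norms (fuzzy OR)
    static double max(double a, double b) { return Math.max(a, b); }
    static double probabilisticSum(double a, double b) { return a + b - a * b; }
    static double boundedSum(double a, double b) { return Math.min(1.0, a + b); }

    /** Triangular membership with feet at a and c and peak at b (a < b < c). */
    static double triangular(double x, double a, double b, double c) {
        if (x <= a || x >= c) return 0.0;
        return x <= b ? (x - a) / (b - a) : (c - x) / (c - b);
    }

    /** Trapezoidal membership with feet a, d and plateau [b, c] (a < b <= c < d). */
    static double trapezoidal(double x, double a, double b, double c, double d) {
        if (x <= a || x >= d) return 0.0;
        if (x < b) return (x - a) / (b - a);
        if (x <= c) return 1.0;
        return (d - x) / (d - c);
    }

    /** Gaussian membership centred at mean with spread sigma > 0. */
    static double gaussian(double x, double mean, double sigma) {
        double z = (x - mean) / sigma;
        return Math.exp(-0.5 * z * z);
    }
}
```

For two membership degrees of 0.5, min and product give 0.5 and 0.25 as conjunctions, while max, probabilistic sum and bounded sum give 0.5, 0.75 and 1.0 as disjunctions, which shows how strongly the choice of operator can change the firing strength of a rule.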

FIS adaptation 

After determining the most appropriate logical parameters, in
order to complete the generation of the FIS underpinning the
DDSS, the operational parameters have to be identified in the
last stage, named FIS Adaptation, in terms of the parameters
characterizing shape and location of the membership functions
for all the terms related to all the linguistic variables
involved in the FRB.

Figure 3 A Fuzzy Inference System.

As for the linguistic variables linked to categorical
features, determining shape and location of their membership
functions is very simple. Since a categorical feature has two
or more categories with no intrinsic ordering among them, its
categories can be modeled as independent singletons. As for
the linguistic variables linked to numeric features, a tuning
process considering the whole FRB obtained has to be used a
posteriori to adjust the membership function parameters. The
classic way to refine the membership functions is to adjust
their parameters through slight modifications in order to find
a local or global minimum of a suitably defined single- or
multi-objective function F(x), which takes into account
specific indexes modeling at least three characteristics [24]
a DDSS should possess.

First, the performance of a DDSS in performing a diagnosis
can be evaluated with reference to the correct classification
rate (CR). In addition, the system should jointly provide a
numerical value (the confidence x) indicating its confidence
in the outcome produced. Furthermore, a DDSS should give
physicians the possibility of deeply understanding how this
outcome has been generated (interpretability), in order to
increase its trustworthiness and not to appear as a black box
that produces unintelligible outputs. It is worth noting that
these characteristics can often conflict with one another.

This tuning process can be implemented using parameter
adjustment algorithms, such as Neural Network algorithms: in
most cases, the gradient of a cost function with respect to
each adjustable parameter can be calculated and the parameters
can be updated accordingly. There are also derivative-free
optimization methods, such as Genetic Algorithms and Random
Search Methods. As a concluding remark, it is worth noting
that in this stage not only the operational parameters can be
adapted, but also the weights of each rule in the FRB, which
have been previously extracted in the first stage and
associated with the fuzzy rules in the fourth stage, can be
refined. In this way, it is possible to induce a better
cooperation among rules and to modulate more accurately the
firing strength of a rule in the process of determining the
output class.
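
The derivative-free option can be illustrated with a very small random search. The sketch below tunes the two parameters of a single ramp-shaped membership function so as to lower the mean square error against 0/1 target labels. The ramp shape, the cost function and all names are assumptions made for illustration; a real adaptation would evaluate the whole FRB rather than one term.

```java
import java.util.Random;

/** Derivative-free tuning of one membership function; an illustrative sketch only. */
final class RandomSearchTuner {
    private RandomSearchTuner() {}

    /** Linear ramp for "High": 0 below v - delta, 1 above v + delta. */
    static double high(double x, double v, double delta) {
        double degree = (x - (v - delta)) / (2.0 * delta);
        return Math.max(0.0, Math.min(1.0, degree));
    }

    /** Mean square error between memberships and 0/1 target labels. */
    static double mse(double[] xs, double[] targets, double v, double delta) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double e = high(xs[i], v, delta) - targets[i];
            sum += e * e;
        }
        return sum / xs.length;
    }

    /**
     * Perturbs (v, delta) at random and keeps a candidate only if it lowers the
     * error, so the returned parameters are never worse than the initial ones.
     * Returns {v, delta, error}.
     */
    static double[] tune(double[] xs, double[] targets, double v, double delta,
                         double step, int iterations, long seed) {
        Random rng = new Random(seed);
        double bestV = v;
        double bestDelta = delta;
        double bestErr = mse(xs, targets, bestV, bestDelta);
        for (int it = 0; it < iterations; it++) {
            double candV = bestV + step * (2.0 * rng.nextDouble() - 1.0);
            double candDelta = Math.max(1e-6,
                    bestDelta + step * (2.0 * rng.nextDouble() - 1.0));
            double err = mse(xs, targets, candV, candDelta);
            if (err < bestErr) {
                bestV = candV;
                bestDelta = candDelta;
                bestErr = err;
            }
        }
        return new double[] {bestV, bestDelta, bestErr};
    }
}
```

Because a candidate is accepted only when it improves the cost, the search can stall in a local minimum; this is the usual trade-off of such simple schemes compared with population-based methods such as Genetic Algorithms.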

Implementing the methodology 

The proposed methodology has been realized in the form 
of a modular and portable architecture according to a 
CBSD approach, with the aim of defining a collection of 
replaceable (and reusable) components characterized by 
a functional cohesion (i.e. the single component performs 
a well-defined set of functions) and a low degree of cou- 
pling in terms of composition and interaction between 
them. 

The choice of a CBSD approach is based on the idea that each
single component can be not only implemented from scratch but
also, in particular, customized or extended from existing
available solutions that are compatible with the original
placeholders. In this way, CBSD can significantly reduce
development effort and time-to-market, and improve the
maintainability, reliability and overall quality of the final
architecture. The architecture is developed in Java according
to the object-oriented paradigm in order to create a truly
portable DDSS, avoiding the problem of having parts of it
implemented for different platforms.

In the following section, the architecture designed is
described in detail and a summary inspection of the main
components in terms of UML class diagrams is reported as well.

Results 

The proposed architecture 

The CBSD approach has generated an extensible and layered
architecture. An extensible architecture is necessary because
the proposed methodology is intended to support the
realization of DDSSs according to both general-purpose and
special-purpose application needs. Special-purpose
requirements need to be incorporated depending on specific
medical scenarios, whereas general-purpose mechanisms are
common across all applications. Moreover, the architecture has
been conceived as layered since, in this way, it can support
design based on increasing levels of abstraction, thereby
partitioning the overall design problem into several
sub-problems. In addition, it supports enhancement and reuse
since, assuming that the interfaces between the layers do not
change, it allows changes to occur within a layer in relative
isolation without impacting the other layers. This improves
the scalability of the architecture as well as its quality and
testability. Thus, such an architecture can lead to standard
interfaces for each layer and its components, so that layer
implementations can be re-used across different DDSSs. The
system architecture adopted is shown in Figure 4.

The architecture provides for three different layers: a 
Process Layer, which coordinates the activity flow foreseen 
by the methodology in terms of two loosely coupled sub- 
processes, a Method Layer which handles highly cohesive 
and well-defined operations (from a functional perspec- 
tive) to be done in each sub-process, and a Model&Data 
Layer which manages the data structures used to read 
and store crisp and fuzzy knowledge bases as well as data 
repositories. Moreover, in accordance with the CBSD
approach, the architecture highlights a transversal layer,
shared between the Method Layer and the Model&Data Layer,
which reflects the idea that each single component at these
two layers can be implemented not only from scratch but also
by re-using or customizing existing available solutions, such
as data mining software packages (e.g. WEKA [25]) or fuzzy
logic libraries (e.g. jFuzzyLogic [26] or XFuzzy [27]).

The definition of the components for the Process Layer has
been guided by a top-down approach, where the methodology,
which can be seen as a high-level business process, has been
divided into two smaller, more manageable and loosely coupled
sub-processes. The need for process modularization is driven
by several motivations. First, the proposed methodology is
large and complex, so its implementation is difficult to
navigate, understand and debug, and its changes are difficult
to track. Moreover, referring to the stages of the
methodology, the clear and well-defined separation between the
crisp and fuzzy domains suggests a natural modularization
strategy from a logical and functional perspective. As a
result, the methodology has been modularized into two
sub-processes, one for each domain, so as to balance the need
for performance and manageability against the need for reuse.
The first sub-process, handled by the Crisp Rule Generator, is
devised to generate a set of fuzzifiable crisp rules starting
from an input dataset, whereas the second one, managed by the
FIS Generator, is designed to produce a whole FIS starting
from the output of the first sub-process. Both these
components act as coordinators of the activity flow foreseen
by their respective sub-process. They coordinate all the
components of the Method Layer by activating their
functionalities, examining and validating their results, and
continuing the handling of the respective sub-process
accordingly.

The Method Layer contains the building-block components for
implementing each single task foreseen in both the crisp and
fuzzy sub-processes. These components are implemented on top
of the Model&Data Layer as a collection of modules which
access the respective knowledge bases and data repositories,
process this information and store the results into the
knowledge bases again. Moreover, all these components can be
suitably specialized in order to support different criteria,
also by wrapping or utilizing existing tools.

In more detail, the Crisp Rule Extractor (Figure 5, left) is
the component in charge of first extracting a collection of
crisp rules from a specific Data Repository and then storing
it into the Crisp Knowledge Base. The extraction procedure can
involve close collaboration with a Crisp Inference Engine,
which is the component responsible for evaluating the firing
of a rule with respect to a specific input data item.

The Crisp Rule Selector (Figure 5, right) is responsible for
getting the extracted rules from the Crisp Knowledge Base,
determining the best n rules (n is a user-defined parameter)
with respect to a specific criterion, and finally storing this
new rule set into the Crisp Knowledge Base again. Since the
criterion applied for the selection might require the
identification of the set of rules with the highest
confidence, this component can interact with the Crisp
Inference Engine to evaluate how each rule works on the
dataset gathered from the Data Repository.
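
A selection step of this kind can be sketched as a simple ranking. The fragment below keeps the n rules with the highest score, here taken to be the rule coverage; the class, its fields and the choice of coverage as the criterion are hypothetical and only illustrate the idea of a pluggable selection criterion.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of a coverage-based rule selection; class and field names are hypothetical. */
final class RuleSelection {
    private RuleSelection() {}

    /** A crisp rule reduced to what the selection needs: its class and its score. */
    static final class ScoredRule {
        final String text;
        final String predictedClass;
        final int coverage;

        ScoredRule(String text, String predictedClass, int coverage) {
            this.text = text;
            this.predictedClass = predictedClass;
            this.coverage = coverage;
        }
    }

    /** Returns the n rules with the highest coverage, best first. */
    static List<ScoredRule> bestN(List<ScoredRule> rules, int n) {
        List<ScoredRule> sorted = new ArrayList<>(rules);
        sorted.sort(Comparator.comparingInt((ScoredRule r) -> r.coverage).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
}
```

Applying bestN separately to the rules of each predicted class, with n = 1, would yield one rule per class.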

Last, the Crisp Reductor (Figure 5, center) is devised to get
the selected rules from the Crisp Knowledge Base, simplify
their structure in order to guarantee fuzzifiability, and
finally store the reduced rules into the Crisp Knowledge Base
again.

The components of the Method Layer described up to this point
cooperate to model the first sub-process and, thus, they are
all coordinated by the Crisp Rule Generator. The remaining
components belonging to the Method Layer act together to
realize the second sub-process and, thus, they are all
coordinated by the FIS Generator.

More precisely, the Fuzzifier is designed to get the reduced
crisp rules, as well as the description of each single feature
involved in them (e.g. in terms of range of admissible
values), from the Crisp Knowledge Base, fuzzify them into
fuzzy rules, linguistic variables and terms, and store the
results into the Fuzzy Knowledge Base, so as to realize the
fourth stage of the methodology (see Figure 6, left). In
detail, this component puts into practice the specific
fuzzification criterion described at the end of the Methods
section.

Referring to the fifth stage of the methodology, the component
involved is the Fuzzy Configurer (see Figure 6, center), which
is responsible for setting the most appropriate logical
parameters to be used for the construction of the final FIS,
as determined by the designer based on experience and
depending on the domain characteristics. Moreover, depending
on the type of reasoning mechanism desired, the specific Fuzzy
Inference Engine, aimed at performing a fuzzy inference to
obtain a fuzzy output, is also configured and instantiated.

The component engaged for the realization of the last stage of
the methodology is the Fuzzy Adapter (Figure 6, right), which
is mainly devised to first get the partially defined FIS from
the Fuzzy Knowledge Base, then tune its operational
parameters, i.e. the parameters characterizing shape and
location of the membership functions, together with the rule
weights, and finally store the completely defined FIS into the
Fuzzy Knowledge Base again. This component closely interacts
with the Fuzzy Inference Engine, since every optimization
strategy requires evaluating how each fuzzy rule works on the
dataset gathered from the Data Repository.
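
The relationship between a coordinator at the Process Layer and the replaceable components at the Method Layer can be sketched with a pair of small Java types. Everything in this fragment is hypothetical, including the interface, its methods and the representation of a knowledge base as a list of strings; it only illustrates the coordination pattern described above and is not the class design of Figure 7.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical types mirroring the layered architecture; not the authors' API. */
final class PipelineSketch {
    private PipelineSketch() {}

    /** A method-layer component: reads a knowledge base and returns an updated one. */
    interface Stage {
        String name();
        List<String> apply(List<String> knowledgeBase);
    }

    /** A process-layer coordinator that activates its stages in order. */
    static final class Coordinator {
        private final List<Stage> stages;
        private final List<String> log = new ArrayList<>();

        Coordinator(List<Stage> stages) {
            this.stages = new ArrayList<>(stages);
        }

        /** Runs every stage, checking that each one returns a non-null result. */
        List<String> run(List<String> input) {
            List<String> current = input;
            for (Stage stage : stages) {
                current = stage.apply(current);
                if (current == null) {
                    throw new IllegalStateException(stage.name() + " produced no result");
                }
                log.add(stage.name());
            }
            return current;
        }

        /** Names of the stages completed so far, in execution order. */
        List<String> executedStages() {
            return new ArrayList<>(log);
        }
    }
}
```

Under this reading, the Crisp Rule Generator would be a Coordinator over an extractor, a selector and a reductor, and the FIS Generator a Coordinator over a fuzzifier, a configurer and an adapter; swapping one technique for another then amounts to passing a different Stage implementation.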

Finally, the Model&Data Layer is responsible for managing the
structures for inserting and gathering information into and
from both the Crisp and Fuzzy Knowledge Bases, respectively,
and, in addition, for accessing the disk-based data structures
used by the data repositories. With respect to the data format
in the repositories, comma-separated values (CSV) are used to
store tabular data (numbers and text) in plain-text form.
Specifically, the first row contains the attribute names
(separated by commas), followed by one row per data item with
the attribute values listed in the same order (also separated
by commas). This choice is due to the fact that many data
repositories and spreadsheet applications save or export data
into flat files in this format.
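
A minimal reader for this layout is sketched below. It assumes plain comma separation with no quoted fields or embedded commas, and the class name and methods are illustrative rather than those of the data repository classes in Figure 7.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal reader for the header-plus-rows CSV layout; assumes no quoted commas. */
final class CsvTable {
    final List<String> attributes = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    /** Parses CSV text whose first non-blank line holds the attribute names. */
    static CsvTable parse(String text) {
        CsvTable table = new CsvTable();
        for (String line : text.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split(",", -1);
            for (int i = 0; i < cells.length; i++) {
                cells[i] = cells[i].trim();
            }
            if (table.attributes.isEmpty()) {
                table.attributes.addAll(Arrays.asList(cells));
            } else if (cells.length == table.attributes.size()) {
                table.rows.add(cells);
            } else {
                throw new IllegalArgumentException("Row has " + cells.length
                        + " values, expected " + table.attributes.size());
            }
        }
        return table;
    }

    /** Returns the value of the named attribute in the given data row. */
    String value(int row, String attribute) {
        int column = attributes.indexOf(attribute);
        if (column < 0) {
            throw new IllegalArgumentException("Unknown attribute: " + attribute);
        }
        return rows.get(row)[column];
    }
}
```

Rejecting rows whose length differs from the header keeps attribute values aligned with their names, which is what the same-order convention of the format relies on.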

The design of this architecture in terms of software classes
is depicted as UML class diagrams, where each class is
summarily outlined in terms of only its most significant
public operations, with respect to three different modules. To
facilitate the reading of the diagrams, note that a solid line
models a structural association between two classes, whereas a
broken line indicates a functional dependency between them.
The first diagram includes the main classes devised for
implementing the architectural components, across the three
layers, which operate in the crisp domain and are responsible
for realizing the first three stages of the methodology
(Figure 7, top). The second one includes the main classes
realizing the components which operate in the fuzzy domain and
are in charge of implementing the last three stages of the
methodology (Figure 7, middle). The last one includes the main
classes realized for the management of the data repositories
(Figure 7, bottom). These classes also offer facilities for
automatically partitioning the dataset into learning and
testing sets, in order to support the K-fold cross validation
method.
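
The partitioning facility can be pictured as follows. This is a sketch under the assumption of a plain shuffled split over instance indices, without stratification by class; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch of K-fold partitioning over instance indices; names are illustrative. */
final class KFold {
    private KFold() {}

    /** Shuffles indices 0..n-1 with a fixed seed and deals them into k folds. */
    static List<List<Integer>> folds(int n, int k, long seed) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            result.add(new ArrayList<Integer>());
        }
        for (int i = 0; i < n; i++) {
            result.get(i % k).add(indices.get(i));
        }
        return result;
    }

    /** The learning set for a fold is every index that is not in its test fold. */
    static List<Integer> learningSet(List<List<Integer>> folds, int testFold) {
        List<Integer> learning = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != testFold) {
                learning.addAll(folds.get(f));
            }
        }
        return learning;
    }
}
```

Each fold serves once as the testing set while the remaining folds form the learning set, and dealing the shuffled indices round-robin keeps the fold sizes within one instance of each other.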

Proof of concept: a DDSS for diagnosing breast masses 

The architecture has been used to instantiate a DDSS 
example aimed at accurately diagnosing breast masses 
starting from the widely used Wisconsin Breast Cancer 
Dataset (in the following, WBCD) to evaluate the feasi- 
bility of the methodology. 

This dataset was computed, through image processing, from fine
needle aspirates (FNA) of breast masses and was collected at
the University of Wisconsin; it can be obtained from the UCI
(University of California at Irvine) machine learning
repository. The samples contain features describing
characteristics of the cell nuclei present in the image. The
version of WBCD used consists of 10 features obtained from
FNA, namely radius, texture, perimeter, area, smoothness,
compactness, concavity, concave points, symmetry and fractal
dimension. Each feature is represented with 3 values, namely
the mean, the standard error and the worst or largest (mean of
the three largest values), but only the mean value was taken
into account. The two outputs are benign and malignant. All
the instances were properly recorded without any missing
value. The diagnosis class is distributed with 357 benign
samples and 212 malignant samples.

The architecture was instantiated for creating the DDSS as
described in the following. Preliminarily, it is worth noting
that the tenfold cross validation method was used for the
assessment of this DDSS and the classification rate was chosen
as the metric to evaluate the goodness of the final results
achieved. In this way, the whole methodology was tested for
its validation, since the goodness of the final results was
considered as proof of a feasible and efficient integration of
different methods according to its activity flow in order to
obtain fuzzy-based DDSSs. The DDSS is described below with
respect to each stage of the methodology, reporting the
partial results calculated for only one fold for the sake of
brevity.

Crisp rules were extracted, as equally weighted, from a J48
decision tree induced from the WBCD. This method was wrapped
on top of its WEKA implementation. Figure 8 outlines the
resulting decision tree and the corresponding set of rules,
grouped according to the diagnosis class, i.e. malignant and
benign, respectively, and ordered according to their coverage,
shown in brackets in Figure 8, which indicates the number of
correctly classified instances.

concave points <= 0.05102
| radius <= 14.97
| | concave points <= 0.04451: 2 (273.0/4.0)
| | concave points > 0.04451
| | | texture <= 24.69: 2 (14.0/2.0)
| | | texture > 24.69: 4 (2.0)
[...]
| | | | | concavity <= 0.1122
| | | | | | radius <= 13.68: 2 (6.0)
| | | | | | radius > 13.68
| | | | | | | radius <= 14.81: 4 (4.0)
| | | | | | | radius > 14.81: 2 (1.0)
| | | | | concavity > 0.1122: 4 (9.0)
| | | texture > 20.78: 4 (22.0)
| | perimeter > 98.92: 4 (131.0)

1. If ((perimeter>98.92) AND (texture>16.39) AND (concave points>0.05102)) then diagnosis = malignant (131)
2. If ((texture>20.78) AND (perimeter<=98.92) AND (texture>16.39) AND (concave points>0.05102)) then diagnosis = malignant (22)
3. If ((texture<=16.21) AND (concave points>0.07857) AND (texture<=16.39) AND (concave points>0.05102)) then diagnosis = malignant (10)
4. If ((concavity>0.1122) AND (smoothness>0.09405) AND (texture<=20.78) AND (perimeter<=98.92) AND (texture>16.39) AND (concave points>0.05102)) then diagnosis = malignant (9)
5. If ((compactness<=0.07165) AND (texture>16.18) AND (radius>14.97) AND (concave points<=0.05102)) then diagnosis = malignant (7)
6. If ((radius<=14.81) AND (radius>13.68) AND (concavity<=0.1122) AND (smoothness>0.09405) AND (texture<=20.78) AND (perimeter<=98.92) AND (texture>16.39) AND (concave points>0.05102)) then diagnosis = malignant (4)
7. If ((texture>24.69) AND (concave points>0.04451) AND (radius<=14.97) AND (concave points<=0.05102)) then diagnosis = malignant (2)
8. If ((fractal dimension>0.05948) AND (compactness>0.07165) AND (texture>16.18) AND (radius>14.97) AND (concave points<=0.05102)) then diagnosis = malignant (2)
9. If ((area>788.5) AND (concave points<=0.07857) AND (texture<=16.39) AND (concave points>0.05102)) then diagnosis = malignant (1)

1. If ((concave points<=0.04451) AND (radius<=14.97) AND (concave points<=0.05102)) then diagnosis = benign (273)
2. If ((area<=788.5) AND (concave points<=0.07857) AND (texture<=16.39) AND (concave points>0.05102)) then diagnosis = benign (16)
3. If ((texture<=24.69) AND (concave points>0.04451) AND (radius<=14.97) AND (concave points<=0.05102)) then diagnosis = benign (14)
4. If ((radius<=13.68) AND (concavity<=0.1122) AND (smoothness>0.09405) AND (texture<=20.78) AND (perimeter<=98.92) AND (texture>16.39) AND (concave points>0.05102)) then diagnosis = benign (6)
5. If ((texture<=16.18) AND (radius>14.97) AND (concave points<=0.05102)) then diagnosis = benign (4)
6. If ((smoothness<=0.09405) AND (texture<=20.78) AND (perimeter<=98.92) AND (texture>16.39) AND (concave points>0.05102)) then diagnosis = benign (4)
7. If ((fractal dimension<=0.05948) AND (compactness>0.07165) AND (texture>16.18) AND (radius>14.97) AND (concave points<=0.05102)) then diagnosis = benign (3)
8. If ((texture>16.21) AND (concave points>0.07857) AND (texture<=16.39) AND (concave points>0.05102)) then diagnosis = benign (1)
9. If ((radius>14.81) AND (radius>13.68) AND (concavity<=0.1122) AND (smoothness>0.09405) AND (texture<=20.78) AND (perimeter<=98.92) AND (texture>16.39) AND (concave points>0.05102)) then diagnosis = benign (1)

Figure 8 The rules extracted

Subsequently, a simplification method based on the selection
of the most confident rules with respect to the rule coverage
was adopted, where the number of rules to be generated was
fixed to 2, i.e. one for each output class. The selected rules
(see Figure 9, top) were then suitably reduced; in particular,
the reduction involved only rule 2 (see Figure 9, middle).
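
The effect of the reduction on rule 2 can be reproduced with a small sketch that keeps, for each numeric feature and each direction of comparison, only the tightest bound. This is an assumption about one way to remove redundant predicates from a conjunctive rule, with hypothetical class names; it is not a transcription of the Reduction stage as formalized in the Methods section.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of removing redundant numeric predicates from one conjunctive rule. */
final class RuleReduction {
    private RuleReduction() {}

    /** A crisp predicate "feature <= threshold" (upper = true) or "feature > threshold". */
    static final class Predicate {
        final String feature;
        final boolean upper;
        final double threshold;

        Predicate(String feature, boolean upper, double threshold) {
            this.feature = feature;
            this.upper = upper;
            this.threshold = threshold;
        }

        @Override
        public String toString() {
            return feature + (upper ? " <= " : " > ") + threshold;
        }
    }

    /**
     * Keeps, for each feature and direction, only the tightest bound: the smallest
     * threshold among "<=" predicates and the largest among ">" predicates.
     */
    static List<Predicate> reduce(List<Predicate> conjunction) {
        Map<String, Predicate> tightest = new LinkedHashMap<>();
        for (Predicate p : conjunction) {
            String key = p.feature + (p.upper ? "|<=" : "|>");
            Predicate kept = tightest.get(key);
            boolean tighter = kept == null
                    || (p.upper ? p.threshold < kept.threshold
                                : p.threshold > kept.threshold);
            if (tighter) {
                tightest.put(key, p);
            }
        }
        return new ArrayList<>(tightest.values());
    }
}
```

For the selected benign rule, the predicates concave points <= 0.04451 and concave points <= 0.05102 collapse into the first one, since any value satisfying the tighter bound also satisfies the looser one, while radius <= 14.97 is left untouched.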

With respect to the Fuzzification stage, a method proposed
here, based on a determinant of interpretability, was applied.
In more detail, in the case of numeric features, a collection
of at most three partitions is generated for the universe of
discourse of each linguistic variable, by considering one
crisp rule at a time and translating its predicates involving
numeric features from their crisp forms to the corresponding
fuzzy representations:

$$p^c(x^c) \equiv (x^c < v^c_1) \;\rightarrow\; p^f \equiv (x^f \ \text{is} \ t^f_l) \qquad (12)$$

$$p^c(x^c) \equiv (x^c < v^c_1 \wedge x^c > v^c_2) \;\rightarrow\; p^f \equiv (x^f \ \text{is} \ t^f_m) \qquad (13)$$

$$p^c(x^c) \equiv (x^c > v^c_2) \;\rightarrow\; p^f \equiv (x^f \ \text{is} \ t^f_h) \qquad (14)$$

where $t^f_l$, $t^f_m$ and $t^f_h$ correspond to linguistic terms
(e.g. low, medium and high) linearly ordered and with boundary
regions between successive terms. It is relevant to highlight
that the semantics associated with each linguistic term
obtained in this way is strictly application-dependent and is
meaningful only in the context of the rule for which it has
been defined. As a result, this method defines as many terms
for each linguistic variable as there are different crisp
predicates which use the corresponding feature when examining
rule by rule. Moreover, in this way, the crisp boundaries
expressed in each predicate for the numeric features involved
are discarded. Thus, on the one hand, the same fuzzy predicate
can be generated from two different instances of a specific
form of crisp predicate appearing in two different rules. On
the other hand, by working at rule level, even if many crisp
predicates in more than one rule share the same numeric
feature with different crisp boundaries, its universe of
discourse is not further partitioned in order to consider the
possible sub-partitions identified by the overlapping of
different crisp intervals. So, any relationship that could
exist between the rules in terms of shared features is not
taken into account. Both these methodological choices, which
can appear as a simplification and a loss of generality, are
guided by the consideration that defining a justifiable, i.e.
limited, number of terms for each variable is another
determinant of interpretability [16]. Indeed, taking into
account all the possible overlaps of different crisp
predicates in terms of numeric intervals, by observing the
whole CRB, in order to finely partition a shared feature,
would over-partition its universe of discourse, compromising
the overall interpretability.

Furthermore, in the case of categorical features, the method
generates, for each linguistic variable, as many terms as
there are crisp predicates in the whole rule set which assign
a different categorical value to the feature used.

As for the predicted class, since it is also a categorical
variable, the number of terms associated with the
corresponding linguistic variable is assumed to be equal to
the number of different values the predicted class can assume.

As a result, the fuzzification of the reduced rules generated
a linguistic variable for every feature appearing in their
antecedent parts, namely perimeter, texture, concave points
and radius, each of them characterized by at most two
partitions of the corresponding universe of discourse.
Moreover, the output class, i.e. diagnosis, was also modeled
as a linguistic variable assuming two specific values, namely
malignant and benign. The resulting fuzzified rules are
outlined in Figure 9 (bottom), where linguistic variables and
their terms are indicated with their first letter capitalized.

1. If ((perimeter > 98.92) AND (texture > 16.39) AND (concave points > 0.05102)) then diagnosis = malignant (131)
2. If ((concave points <= 0.04451) AND (radius <= 14.97) AND (concave points <= 0.05102)) then diagnosis = benign (273)

1. If ((perimeter > 98.92) AND (texture > 16.39) AND (concave points > 0.05102)) then diagnosis = malignant
2. If ((concave points <= 0.04451) AND (radius <= 14.97)) then diagnosis = benign

1. If ((Perimeter is High) AND (Texture is High) AND (Concave Points is High)) then Diagnosis is Malignant
2. If ((Concave Points is Low) AND (Radius is Low)) then Diagnosis is Benign

Figure 9 The rules selected (top), reduced (middle), and fuzzified (bottom)

Concerning the generation of the other parameters pertaining
to the final FIS underpinning the DDSS, the terms Low and High
generated after the fuzzification for the input variables were
modeled with piece-wise linear membership functions, whereas
the terms Malignant and Benign for the output variable were
represented as singletons. The Singleton-type reasoning
mechanism was used, where the min and max operators were
chosen as T-norm and S-norm functions, respectively. Moreover,
the min operator and the center of gravity for singletons
method were applied for implementing the implication and
defuzzification functions, respectively. A non-chained Fuzzy
Inference Engine was used by wrapping the implementation given
within the jFuzzyLogic tool.
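
The inference performed with this configuration can be followed step by step in the sketch below, which is written from scratch and does not use the jFuzzyLogic API. The positions of the two output singletons and the breakpoints passed to the membership functions are hypothetical; only the min T-norm and the centre of gravity for singletons are taken from the text. With one rule per class, the max S-norm aggregation of rules predicting the same class is trivial and is therefore omitted.

```java
/** Worked sketch of Singleton-type inference for two rules, one per output class. */
final class SingletonInference {
    private SingletonInference() {}

    // Hypothetical positions of the output singletons on the Diagnosis axis.
    static final double BENIGN = 0.0;
    static final double MALIGNANT = 1.0;

    /** Piece-wise linear "High": 0 up to lo, 1 from hi onwards, linear in between. */
    static double high(double x, double lo, double hi) {
        return Math.max(0.0, Math.min(1.0, (x - lo) / (hi - lo)));
    }

    /** "Low" taken as the complement of "High" over the same breakpoints. */
    static double low(double x, double lo, double hi) {
        return 1.0 - high(x, lo, hi);
    }

    /** Firing strength of a conjunctive antecedent with min as the T-norm. */
    static double and(double... degrees) {
        double strength = 1.0;
        for (double d : degrees) {
            strength = Math.min(strength, d);
        }
        return strength;
    }

    /**
     * Centre of gravity for singletons: the average of the singleton positions
     * weighted by the firing strength of the rule predicting each class.
     */
    static double defuzzify(double malignantStrength, double benignStrength) {
        double total = malignantStrength + benignStrength;
        if (total == 0.0) {
            return Double.NaN; // no rule fired, so no diagnosis can be proposed
        }
        return (malignantStrength * MALIGNANT + benignStrength * BENIGN) / total;
    }
}
```

If the malignant rule fires with strength min(0.8, 0.6, 0.9) = 0.6 and the benign rule with min(0.1, 0.3) = 0.1, the defuzzified output is 0.6 / 0.7, about 0.86, i.e. much closer to the Malignant singleton than to the Benign one.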

Finally, adaptation based on the Deltajump algorithm was
carried out for optimizing only the membership functions
linked to the terms of the linguistic variables involved in
the fuzzy rules obtained. This algorithm was wrapped on top of
the implementation provided by the jFuzzyLogic tool. Only a
metric based on the plain mean square error was used for the
evaluation with respect to the classification rate, while
interpretability [6] and confidence [4] are not considered
yet. The fuzzy partitions achieved for each linguistic
variable are outlined in Figure 10.

Figure 10 The fuzzy partitions adapted for each linguistic variable (panels shown include Concave Points and Radius)

These results were finally validated with respect to the
classification rate measured for the DDSS instantiated on the
first fold of the WBCD. In more detail, the classification
rate was calculated on both the learning and test sets for the
rules achieved in the different stages of the methodology,
i.e. from the crisp rules obtained at the end of rule
extraction, through their selected and reduced versions, to
the fuzzified rules before and after their adaptation. The
validation results are outlined in Figure 11.

Figures 12 and 13 sketch the GUI implemented for facilitating
the construction of a DDSS on top of the proposed
architecture: the user is asked to specify the dataset to be
used, the algorithms or techniques to be adopted in all the
stages of the methodology with all the parameters required for
their configuration and, finally, the validation method for
evaluating the results with respect to a specific metric or
index (Figure 12).

Figure 13 reports the results achieved in terms of fuzzy 
rules extracted, linguistic variables and terms involved in 
the rules and, for each term, the corresponding member- 
ship function optimized. Moreover, such results are also 
enriched by means of the values assumed by the quanti- 
tative metrics used for the validation. 

fold: 0 

CR on learning set using all crisp rules = 98.82 
CR on test set using all crisp rules = 94.91 

CR on learning set using selected crisp rules = 90.00 
CR on test set using selected crisp rules = 94.91 

CR on learning set using reduced crisp rules = 89.60 
CR on test set using reduced crisp rules = 93.22 

CR on learning set using the unadapted FIS = 71.76 
CR on test set using the unadapted FIS = 76.27 

CR on learning set using the adapted FIS = 93.72 
CR on test set using the adapted FIS = 96.61 

Figure 11 The results achieved after the validation in terms of classification rate 

Conclusions 

With the aim of extending the range of possible users of 
fuzzy-based DDSSs through extensive and easy-to-use 
facilities that considerably reduce the level of knowledge 
and experience required for their design and realization, 
this work has first presented a formalization of a refined 
and assessed version of a six-step methodology to design 
and implement fuzzy-based DDSSs. Its strength relies on 
its generality and modularity, since it supports the 
integration of alternative techniques in each of its stages. 
The stages are devoted to: (i) the extraction of crisp rules, 
(ii) the selection of a significant partition from the whole 
rule set extracted, (iii) the reduction of the selected rule 
set, (iv) the creation of fuzzy rules, (v) the generation of 
the whole fuzzy inference system and (vi) its optimization. 
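
The modularity of the stages can be illustrated with a short Java sketch in which every stage is a replaceable component and the stages are chained in sequence. The `Stage` interface, its `then` method, the `PipelineSketch` class and the string encoding of rules are hypothetical names introduced only for this illustration; the three stages shown are trivial placeholders and not the techniques actually adopted in the architecture.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stage abstraction: one step of the methodology, from input I to output O. */
interface Stage<I, O> {
    O apply(I input);

    /** Chains this stage with the next one; any stage can be replaced independently. */
    default <R> Stage<I, R> then(Stage<O, R> next) {
        return input -> next.apply(this.apply(input));
    }
}

final class PipelineSketch {

    private PipelineSketch() { }

    /** Runs three placeholder stages on toy rules encoded as plain strings. */
    static List<String> run(List<String> crispRules) {
        // Placeholder "selection": drop empty rules.
        Stage<List<String>, List<String>> selection =
            rules -> rules.stream().filter(r -> !r.isEmpty()).collect(Collectors.toList());
        // Placeholder "reduction": drop duplicated rules.
        Stage<List<String>, List<String>> reduction =
            rules -> rules.stream().distinct().collect(Collectors.toList());
        // Placeholder "fuzzification": tag each remaining rule as fuzzy.
        Stage<List<String>, List<String>> fuzzification =
            rules -> rules.stream().map(r -> "fuzzy(" + r + ")").collect(Collectors.toList());

        return selection.then(reduction).then(fuzzification).apply(crispRules);
    }

    public static void main(String[] args) {
        // prints [fuzzy(r1), fuzzy(r2)]
        System.out.println(run(Arrays.asList("r1", "", "r1", "r2")));
    }
}
```

Swapping one technique for another in a given stage then amounts to passing a different `Stage` instance, leaving the remaining stages untouched.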

Unlike other existing approaches [14], [15], the described 
methodology is extremely flexible and does not depend on 
the type of fuzzy model to be defined, since it enables the 
design and realization of fuzzy-based DDSSs by taking into 
account many different and often conflicting requirements, 
such as accuracy maximization or complexity minimization. 
In more detail, it not only can integrate state-of-the-art 
rule-induction and rule-optimization methods, but also 
leaves free the choice of the structural and operational 
elements of the fuzzy model to be used, such as the shape 
of the membership functions, the t-norm and s-norm 
connectors, and the implication and defuzzification 
operators. Moreover, methods that use shared fuzzy sets 
for the rule base [16] are appropriate within a small work 
space with good coverage. Otherwise, in case of weak 
coverage, the rule base completeness is not guaranteed 
and, when dealing with large systems, the number of 
combinations to manage is huge [16]. On the contrary, the 
proposed methodology is well suited to large work spaces 
and generates more compact incomplete rules with only the 
most locally significant variables, defined successively 
with a partitioning strictly dependent on the rules where 
they are involved. 
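
To make the free choice of connectors concrete, the following Java sketch treats the t-norm and the s-norm as interchangeable operators on membership degrees in [0, 1]. The minimum and product t-norms and the maximum and probabilistic-sum s-norms are standard textbook choices; the class `FuzzyConnectives` and its method names are assumptions made for this example and do not reproduce the actual classes of the architecture.

```java
import java.util.function.DoubleBinaryOperator;

/** Illustrative holder (hypothetical name) for pluggable fuzzy connectives. */
final class FuzzyConnectives {

    private FuzzyConnectives() { }

    // Two standard t-norms (fuzzy AND).
    static final DoubleBinaryOperator MIN_T_NORM = Math::min;
    static final DoubleBinaryOperator PRODUCT_T_NORM = (a, b) -> a * b;

    // Two standard s-norms (fuzzy OR).
    static final DoubleBinaryOperator MAX_S_NORM = Math::max;
    static final DoubleBinaryOperator PROBABILISTIC_SUM = (a, b) -> a + b - a * b;

    /** Combines the antecedent membership degrees of one rule with the chosen t-norm. */
    static double firingStrength(DoubleBinaryOperator tNorm, double... degrees) {
        double strength = 1.0; // 1 is the neutral element of every t-norm
        for (double degree : degrees) {
            strength = tNorm.applyAsDouble(strength, degree);
        }
        return strength;
    }

    /** Aggregates the strengths of several rules with the chosen s-norm. */
    static double aggregate(DoubleBinaryOperator sNorm, double... strengths) {
        double result = 0.0; // 0 is the neutral element of every s-norm
        for (double s : strengths) {
            result = sNorm.applyAsDouble(result, s);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same antecedent degrees, two different t-norms.
        System.out.println(firingStrength(MIN_T_NORM, 0.8, 0.5));
        System.out.println(firingStrength(PRODUCT_T_NORM, 0.8, 0.5));
        // Same rule strengths, two different s-norms.
        System.out.println(aggregate(MAX_S_NORM, 0.8, 0.5));
        System.out.println(aggregate(PROBABILISTIC_SUM, 0.8, 0.5));
    }
}
```

Because the operators are passed as parameters, changing the connectors of a fuzzy model does not require touching the code that evaluates the rules.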

The presented methodology has been realized, according 
to a CBSD approach, in the form of a modular and porta- 
ble architecture that has been carefully described from a 
software engineering perspective. 

This architecture has been conceived to support the 
design of a fuzzy-based DDSS on increasing levels of 
abstraction, thereby partitioning the overall design problem 
into several sub-problems, where each single component at 
every layer can be implemented from scratch or customized 
from existing available solutions. In this way, it can 
significantly reduce development effort and time-to-market, 
and improve the maintainability, reliability and overall 
quality of the final CDSS designed. 

The development of this architecture has finally been 
carried out by using the Java language, since it offers 
several features that argue for it. It is widely distributed 
and has become one of the major programming languages. 
The development kit, including compiler and debugger, is 
freely available on a number of different computer 
platforms. The core libraries contain many functions which 
can be used directly and need not be adopted from 
external libraries, which is not the case in C++ for 
instance. By exploiting Java features and its diffusion in 
the user community, the proposed architecture has several 
advantages, e.g. it reduces programming work. Thanks to 
the huge amount of available Java software, in fact, it is 
easy to create new methods to be added to the Method 
Layer of the architecture without the effort of starting 
from scratch. In this way, it is possible to quickly 
incorporate new developments made by the active research 
community, which is always working in emerging fields. 
Moreover, due to the use of a strict object-oriented 
approach for its components, the architecture can be used 
on any machine with Java. Indeed, modularity of code is 
essential to increase the level of portability. As a result, 
any user can apply the architecture to implement a DDSS 
on their own machine, independently of the operating 
system. 

Figure 12 The implemented system: how to configure each stage of the methodology. 

Figure 13 The implemented system: the validation results with respect to a specific metric indicated 

As a proof of concept, such an architecture has been 
used to instantiate a DDSS example aimed at accurately 
diagnosing breast masses starting from the widely used 
Wisconsin Breast Cancer Dataset. The results obtained in 
terms of classification rate proved the feasibility of the 
whole methodology implemented in terms of the archi- 
tecture proposed. 

As for ongoing work, knowledge representation techniques 
such as ontology modeling are being investigated in order 
to better define, from a semantic point of view, the fuzzy 
variables and terms involved in the rules, and to improve 
the readability and understandability of the whole 
fuzzy-based DDSS. The choice of Java as programming 
language will facilitate this integration, since the most 
representative tools in the context of knowledge 
engineering are implemented in Java and released as open 
source projects. Moreover, since DDSSs typically have 
unequal classification error costs, so that straight CR 
cannot be assumed as a careful measure of the goodness of 
a DDSS, the confidence % will also be evaluated in the 
future and used for selecting a DDSS; in fact, a good 
DDSS should be highly confident with correctly classified 
examples, while it should be doubtful with misclassified 
data points. In this direction, more sophisticated 
adaptation techniques able to optimize multi-objective 
cost functions will also be integrated, so as to take into 
account simultaneously CR, the confidence % and the 
interpretability. The last important point for future work 
is to integrate multi-threading and distributed computing 
to speed up computations during the definition and the 
adaptation of the fuzzy-based DDSS by using widely 
available multi-processor and multi-core hardware. 
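
The remark on unequal classification error costs can be illustrated with a small Java sketch: two classifiers with the same CR may differ markedly once false negatives are weighted more than false positives. The class `CostSensitiveError`, the method `averageCost` and the cost values are purely hypothetical and serve only to show why straight CR may not suffice; they do not implement the confidence % measure mentioned above.

```java
/** Illustrative helper (hypothetical name): average misclassification cost. */
final class CostSensitiveError {

    private CostSensitiveError() { }

    /**
     * Labels: 0 = negative, 1 = positive.
     * Each false positive and false negative contributes its own (assumed) cost;
     * correct classifications cost nothing.
     */
    static double averageCost(int[] predicted, int[] actual,
                              double falsePositiveCost, double falseNegativeCost) {
        if (predicted.length != actual.length || actual.length == 0) {
            throw new IllegalArgumentException("label arrays must be non-empty and of equal length");
        }
        double total = 0.0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == 1 && actual[i] == 0) {
                total += falsePositiveCost;
            } else if (predicted[i] == 0 && actual[i] == 1) {
                total += falseNegativeCost;
            }
        }
        return total / actual.length;
    }

    public static void main(String[] args) {
        int[] actual      = {1, 1, 0, 0};
        int[] classifierA = {0, 1, 0, 0}; // one false negative, CR = 75%
        int[] classifierB = {1, 1, 1, 0}; // one false positive, CR = 75%
        // Assumed costs: a missed positive is five times worse than a false alarm.
        System.out.println(averageCost(classifierA, actual, 1.0, 5.0)); // prints 1.25
        System.out.println(averageCost(classifierB, actual, 1.0, 5.0)); // prints 0.25
    }
}
```

Both toy classifiers reach the same CR, yet their average costs differ by a factor of five under the assumed weights, which is the kind of distinction a multi-objective cost function would need to capture.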